#multi-page web tasks 05/06/2025
WebChoreArena: Pushing AI Web Agents Beyond Simple Browsing with Complex Memory and Reasoning Tasks
The WebChoreArena benchmark introduces complex memory and reasoning tasks to evaluate AI web agents more rigorously. Its results show that current models face significant challenges once tasks go beyond simple browsing.